Abstract

Although  several  approaches  have  been  developed  for 
planning  in  nondeterministic  domains,  solving  large 
planning  problems  is  still  quite  difficult.  In  this  work, 
we  present  a  novel  algorithm,  called  YoYo,  for  planning 
in  nondeterministic  domains  under  the  assumption  of 
full observability. This algorithm enables us to combine
the power of search-control strategies, as in planning
with Hierarchical Task Networks (HTNs), with techniques
from planning via Symbolic Model Checking
(SMC). Our experimental evaluation confirms the potential
of our approach, demonstrating that it combines
the advantages of these paradigms.

Introduction 

More  and  more  research  is  addressing  the  problem  of 
planning  in  nondeterministic  domains.  In  spite  of  the 
recent  promising  results,  the  problem  is  still  very  hard 
to  solve  in  practice,  even  under  the  simplifying  assump¬ 
tion  of  full  observability,  i.e.,  the  hypothesis  that  the 
state  of  the  world  can  be  completely  observed  at  run¬ 
time.  Indeed,  in  the  case  of  nondeterministic  domains, 
the  planning  algorithm  must  reason  about  all  possi¬ 
ble  different  execution  paths  to  find  a  plan  that  works 
despite  the  nondeterminism,  and  the  dimension  of  the 
generated  conditional  plan  may  grow  exponentially. 

Among  others,  planning  based  on  Symbolic  Model 
Checking  (Cimatti  et  al.  2003;  Rintanen  2002;  Jensen 
&  Veloso  2000;  Cimatti,  Roveri,  &  Traverso  1998)  is 
one  of  the  most  promising  approaches  for  planning  un¬ 
der  conditions  of  nondeterminism.  This  technique  relies 
on  the  usage  of  propositional  formulas  for  a  compact 
representation  of  sets  of  states,  and  of  transformations 
over  such  formulas  for  efficient  exploration  in  the  search 
space.  The  most  common  implementations  of  plan¬ 
ning  based  on  symbolic  model  checking  have  been  re¬ 
alized  with  Binary  Decision  Diagrams  (BDDs)  (Bryant 
1992),  data  structures  that  are  well-suited  to  compactly 
represent  propositional  formulas  and  to  efficiently  com¬ 
pute  their  transformations.  In  different  experimental 
settings, planning algorithms based on symbolic model
checking  and  BDDs,  e.g.,  those  implemented  in  MBP 
(Bertoli  et  al.  2001a),  have  been  shown  to  scale  up  to 
rather  large-sized  problems  (Cimatti  et  al.  2003). 

Another  promising  approach  to  planning  with  non¬ 
determinism  is  forward-chaining  planning  with  Hier¬ 
archical  Task  Networks  (HTNs),  which  was  originally 
developed  to  provide  efficient  search-control  heuristics 
for  classical  deterministic  domains  (Nau  et  al.  2003). 
Kuter & Nau (2004) describe a way to generalize this
approach to work in the nondeterministic case, along
with a class of other forward-chaining planning techniques.
The ND-SHOP2 planner, a nondeterminization
of SHOP2 (Nau et al. 2003), is a forward-chaining
HTN planner developed based on this technique. Like
its  predecessor,  ND-SHOP2  has  the  ability  to  exploit 
expressive  domain-specific  search-control  heuristics  to 
guide  its  search  for  solutions.  It  has  been  demon¬ 
strated  in  (Kuter  &  Nau  2004)  that  ND-SHOP2  can 
be  very  effective  in  pruning  the  search  space,  and  in 
some experiments ND-SHOP2 outperforms MBP. Unfortunately,
ND-SHOP2 cannot efficiently solve problems
where strategies cannot cut down the search space,
since it does not have MBP's ability to work with symbolic
representations of abstract collections of states.

In  this  paper,  we  have  devised  a  formalism  and  de¬ 
veloped  a  novel  algorithm,  called  YoYo,  that  enables  us 
to  combine  the  power  of  the  HTN-based  search-control 
strategies  with  BDD-based  symbolic  model  checking 
techniques.  YoYo  implements  an  HTN-based  forward¬ 
chaining  search  as  in  ND-SHOP2,  built  on  top  of  sym¬ 
bolic  model-checking  primitives  based  on  BDDs  as  in 
MBP.  This  combination  has  required  a  complete  re¬ 
thinking  of  the  ND-SHOP2  algorithm,  in  order  to  take 
advantage  of  situations  where  the  BDD  representation 
will  allow  it  to  avoid  enumerating  states  explicitly. 

We  have  performed  an  experimental  comparison  of 
YoYo  with  MBP  and  ND-SHOP2.  The  results  confirm 
the  advantage  of  combining  search-control  heuristics 
with  symbolic  model  checking:  YoYo’s  BDD  represen¬ 
tation  enabled  it  to  represent  large  problems  compactly 
while  exploiting  HTN  search-control  strategies  to  prune 
large parts of the search space. YoYo easily outperformed
both MBP and ND-SHOP2 in all cases, and it
could deal with problem sizes that neither MBP nor
ND-SHOP2 could scale up to.

The paper is organized as follows. We first explain
in more detail why the integration of
forward-chaining HTN planning with symbolic model-checking
techniques should provide important advantages.
We do this with the help of a well-known example,
the Hunter-Prey domain (Koenig & Simmons 1995).
Then,  we  present  our  formal  setting,  the  YoYo  planning 
algorithm,  and  its  implementation  using  BDDs.  Next, 
we  present  an  experimental  analysis  of  our  approach 
and  discuss  our  results.  Finally,  we  conclude  with  our 
future  research  directions. 

Motivations 

We have identified two promising approaches for planning
in nondeterministic domains; namely, planning
with Symbolic Model Checking (SMC) and forward-chaining
planning with Hierarchical Task Networks
(HTNs). These two planning techniques have complementary
advantages: the former uses propositional formulas for a
compact representation of sets of states, and transformations
over such formulas for efficient exploration
of the search space, while the latter can exploit very efficient
search-control heuristics for pruning the search
space. Thus, it is reasonable to assume
that planners built on these two
approaches perform well on different kinds of
planning problems and domains. It is not hard to find
planning domains that verify this assumption: one example
is the well-known Hunter-Prey domain, which was
first introduced in (Koenig & Simmons 1995).

In  the  Hunter-Prey  domain,  there  is  a  hunter  and  a 
prey  in  an  n  x  n  grid  world.  The  task  of  the  hunter 
is  to  catch  the  prey  in  the  world.  The  hunter  has  five 
possible  actions;  namely,  north,  south,  east,  west,  and 
catch.  The  prey  has  also  five  actions:  it  has  the  same 
four  moves  as  the  hunter,  and  an  action  to  stay  still  in 
the  world.  The  hunter  can  catch  the  prey  only  when 
the  hunter  and  the  prey  are  at  the  same  location  at  the 
same  time  in  the  world.  The  nondeterminism  for  the 
hunter  is  introduced  through  the  actions  of  the  prey: 
at  any  time,  it  can  take  any  of  its  actions,  independent 
from  the  hunter’s  move. 

We have experimented with two state-of-the-art planners
designed to work in nondeterministic planning domains,
namely ND-SHOP2 and MBP. ND-SHOP2 is
a forward-chaining HTN-planning algorithm generated
by the “nondeterminization” technique of (Kuter &
Nau 2004). MBP, on the other hand, is a planning
system designed for exploiting representations and
planning techniques based on symbolic model checking
and BDDs (Bertoli et al. 2001a).

Figure  1  shows  the  average  running  times  required  by 
MBP  and  ND-SHOP2,  as  a  function  of  increasing  grid 
sizes.  These  results  are  obtained  by  running  the  two 
planners  over  20  randomly-generated  problems  for  each 
grid  size,  and  then,  by  averaging  the  results. 


Figure 1: Average running times in seconds for MBP and
ND-SHOP2  in  the  Hunter-Prey  Domain  as  a  function  of 
the  grid  size,  with  one  prey.  ND-SHOP2  was  not  able 
to  solve  planning  problems  in  grids  larger  than  10  x  10 
due  to  memory-overflow  problems. 


Figure 2: Average running times in seconds for MBP and
ND-SHOP2  in  the  Hunter-Prey  Domain  as  a  function 
of  the  number  of  preys,  with  a  fixed  3x3  grid. 


ND-SHOP2 ran out of memory in the large problems
of this domain for two reasons: (1) the solutions
for the problems in this domain are too large to
store using an explicit representation, and (2) the search
space does not admit a structure that can be exploited
by search-control heuristics. Note that this domain allows
only for high-level strategies for the hunter such
as “look at the prey and move towards it,” since the
hunter does not know which actions the prey will take
at a particular time. MBP, on the other hand, clearly
outperforms ND-SHOP2 in these experiments, demonstrating
the advantage of using BDD-based representations
over explicit ones.

To test the effectiveness of the search-control heuristics,
we have created a variation of the domain in which
there may be more than one prey to catch. We made
the movements of the preys dependent on each other by
assuming that a prey cannot move to a location next
to another prey in the world. Figure 2 shows the
results in this adapted domain, with the 3 x 3 grid
world: ND-SHOP2 is able to outperform MBP in this
domain. The reason for the difference in these results
compared to the previous ones is that this adapted
domain allows much more powerful strategies for the
hunter: e.g., “choose one prey and chase it while ignoring
others; when you catch that prey, choose another
and chase it, and continue in this way until all
of the preys are caught.” ND-SHOP2, using this strategy,
is able to avoid the combinatorial explosion due
to the nondeterminism in the world. On the other hand,
the BDD-based representations in MBP explode in size
since the movements of the preys are dependent on
each other, and MBP's backward-chaining breadth-first
search techniques apparently cannot compensate for
such an explosion.

The two experiments above clearly show that planning
using HTN-based search-control heuristics and
BDD-based compact representations have complementary
advantages, and they suggest that the efficiency
of planning can be improved
when these two techniques are combined in a single
planning framework. In this paper, we present one such
framework that is built on forward-chaining HTN planning
over symbolic model-checking primitives, and a
planning algorithm that works in that framework.

Background 

We use the usual definitions for nondeterministic planning
domains and planning problems in such domains
as in (Cimatti et al. 2003). A nondeterministic planning
domain is a tuple of the form $(V, S, A, R)$, where
$V$ is a finite set of propositions, $S \subseteq 2^V$ is the set of all
possible states, $A$ is the finite set of all possible actions,
and $R \subseteq S \times A \times S$ is the state-transition relation. The
set of successor states generated when an action $a$ is
applied in $s$ is $\gamma(s, a) = \{s' \mid (s, a, s') \in R\}$; we say $a$ is
not applicable in $s$ if $\gamma(s, a) = \emptyset$. The set of states in
which an action $a \in A$ can be applied is $S_a \subseteq S$.

For instance, consider the Hunter-Prey domain described
in the previous section. In this domain, a state
$s \in S$ describes the positions of the hunter and
the prey on the grid. The actions in $A$ describe the possible
moves of the hunter as well as the act of catching a
prey. The actions of the prey are modeled through the
effects of the hunter's actions; i.e., they are described
in the transition relation $R$.
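
As an illustration, the successor function for the one-prey Hunter-Prey domain can be sketched with explicit states. This is only a sketch under stated assumptions: the state encoding `(hunter, prey, caught)`, the helper names `step` and `gamma`, and the choices that `catch` is deterministic and that nothing is applicable after the prey is caught are ours, not part of the formal setting above.

```python
# Hypothetical explicit-state encoding of the one-prey Hunter-Prey domain.
# A state is (hunter_position, prey_position, caught); positions are (x, y).
MOVES = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}


def step(position, delta, n):
    """Return the new position on an n x n grid, or None if it leaves the grid."""
    x, y = position[0] + delta[0], position[1] + delta[1]
    return (x, y) if 0 <= x < n and 0 <= y < n else None


def gamma(state, action, n):
    """Successor set gamma(s, a); an empty set means a is not applicable in s."""
    hunter, prey, caught = state
    if caught:
        return set()  # assumption: nothing is applicable once the prey is caught
    if action == "catch":
        # assumption: catch is deterministic and needs both on the same cell
        return {(hunter, prey, True)} if hunter == prey else set()
    new_hunter = step(hunter, MOVES[action], n)
    if new_hunter is None:
        return set()
    # The prey's move is folded into the effects of the hunter's action:
    # it may stay still or take any move that keeps it on the grid.
    prey_options = {prey}
    for delta in MOVES.values():
        moved = step(prey, delta, n)
        if moved is not None:
            prey_options.add(moved)
    return {(new_hunter, p, False) for p in prey_options}
```

Each hunter move therefore yields one successor state per legal prey reaction, which is where the nondeterminism of the domain comes from.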

We define a policy to be a set $\pi = \{(s, a) \mid s \in
S \text{ and } a \in A(s)\}$, where $A(s) \subseteq A$ is the set of actions
that are applicable in $s$. The set of states in a policy is
$S_\pi = \{s \mid (s, a) \in \pi\}$. An execution structure induced
by the policy $\pi$ is a directed graph $\Sigma_\pi = (V_\pi, E_\pi)$: $V_\pi$
is the set of the nodes of $\Sigma_\pi$, which represent the states
that can be generated by executing actions in $\pi$. $E_\pi$ is
the set of arcs between the nodes of $\Sigma_\pi$, which represent
possible state transitions caused by the actions in $\pi$.


A planning problem in a nondeterministic planning
domain $D = (V, S, A, R)$ is a tuple of the form $P =
(D, I, G)$, where $I \subseteq S$ is a set of initial states, and
$G \subseteq S$ is a set of goal states. In this paper, we focus
only on strong and strong-cyclic solutions for planning
problems. We summarize their definitions here; for a
detailed discussion, see (Cimatti et al. 2003):

• A strong solution is a policy that is guaranteed to
achieve the goals of the problem, despite the nondeterminism
in the domain. That is, a policy $\pi$ is a
strong solution if (1) every finite path in the execution
structure $\Sigma_\pi$ reaches a final node that satisfies
the goals, and (2) there are no infinite paths in
$\Sigma_\pi$; i.e., $\Sigma_\pi$ is acyclic.

• A strong-cyclic solution is a policy that is guaranteed
to reach the goals under a “fairness assumption”; i.e.,
the assumption that the execution of a strong-cyclic
solution will eventually exit the loops. More specifically,
in a strong-cyclic solution $\pi$, every partial path
in the execution structure $\Sigma_\pi$ can be extended to a
finite execution path that reaches a goal.

We use the usual definitions for primitive tasks, nonprimitive
tasks, task networks, and methods as in (Nau
et al. 2003), except that we restrict ourselves to ground
instances of these constructs in this paper. We assume
the existence of a finite set of symbols that denote
the tasks to be performed in a planning domain
$D = (V, S, A, R)$. Every action in $A$ is a task symbol,
and there are some additional task symbols called nonprimitive
tasks. A task network is a partially-ordered
set of tasks.

In  this  paper,  we  adopt  the  notion  of  ordered  task 
decomposition  (Nau  et  al.  2003);  that  is,  the  tasks  in 
a  task  network  are  decomposed  into  subtasks  in  the 
order  they  are  supposed  to  be  performed  in  the  world. 
A  method  describes  a  possible  way  of  decomposing  the 
tasks  in  a  task  network  into  smaller  and  smaller  tasks. 
More formally, a method is an expression of the form
$m = (t, C, w)$ such that $t$ is a nonprimitive task, $C$
is a conjunction of literals, and $w$ is a task network
that denotes the subtasks generated by decomposing $t$
by $m$. The set of states in which $m$ can be applied is
$S_m = \{s \mid s \in S \text{ and } C \text{ holds in } s\}$.

Methods  describe  the  search-control  strategies  to  be 
used  in  a  domain.  As  an  example,  suppose  we  have  a 
task  chase_prey  in  the  Hunter-Prey  domain.  A  method 
for  this  task  can  be  defined  as  follows: 

(:method (:task (chase_prey))
         (:conditions (prey_not_caught)
                      (prey_to_the_north hunter))
         (:subtasks (move_north hunter) (chase_prey)))

This method is applicable only in the states in which
the prey has not been caught yet and is currently
at a location to the north of the hunter in the world.
It specifies the following search-control strategy: if the
world is in any of the states in which the method is
applicable, then the hunter should move north first and
continue chasing the prey from there.


Procedure YoYo(D, I, G, w, M)
    return YoyoAux(D, {(I, w)}, G, M, ∅, {(I, w)})

Procedure YoyoAux(D, X, G, M, π, X0)
    X ← PruneSituations(X, G, π)
    if there is a situation (S, w = nil) ∈ X such that S ⊈ G
        then return(failure)
    if NoGoodPolicy(π, X, G, X0) then return(failure)
    if X = ∅ then return(π)
    select a situation (S, w) from X and remove it
    F ← ComputeDecomposition(S, w, D, M)
    if F = ∅ then return(failure)
    X' ← ComputeSuccessors(F, X)
    π' ← π ∪ {(s, a) | (S', a, w') ∈ F and s ∈ S'}
    π ← YoyoAux(D, X', G, M, π', X0)
    if π = failure then return(failure)
    return(π)


Figure 3: YoYo, an HTN planning algorithm for generating
solutions in nondeterministic domains. In the
YoyoAux procedure above, X is the current set of situations
and X0 is the initial set of situations; i.e.,
X0 = {(I, w)}.

Let $s$ be a state, $w$ be a task network, and $t$ be a task
that has no predecessors in $w$; i.e., $t$ can be decomposed
into smaller tasks by the semantics of ordered
task decomposition since there is no task before $t$ that
is to be achieved. Then we have two cases:

• $t$ is a primitive task. Then, $t$ can be directly executed
in $s$ (i.e., $t$ corresponds to an action in $A$) if
$s \in S_t$. The result of that application is the set of
states $\gamma(s, t)$ and the successor task network $w \setminus \{t\}$.

• $t$ is a nonprimitive task. Let $m$ be a method for $t$;
i.e., $m = (t, C, w')$. Then, $m$ can be used to decompose
$t$ in $s$ if $s \in S_m$. The result of that decomposition
is the task network $(w \setminus \{t\}) \cup w'$ such that every
partial-ordering constraint in both $w \setminus \{t\}$ and $w'$ is
satisfied in $(w \setminus \{t\}) \cup w'$.
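
The two cases above can be sketched as a single progression step. The sketch makes several simplifying assumptions that are ours: task networks are totally ordered tuples of task names, a method is a `(task, condition, subtasks)` triple whose condition is a Python predicate, and the names `progress`, `ACTIONS`, `METHODS`, and `toy_gamma` are hypothetical. The toy instance is a one-dimensional variant of the chase_prey example.

```python
def progress(state, network, actions, methods, gamma):
    """One step of ordered task decomposition on the first task of `network`.

    `network` is a tuple of task names (a totally ordered task network).
    Returns (states, new_network), or None if the first task cannot be
    handled in `state`.
    """
    task, rest = network[0], network[1:]
    if task in actions:                        # t is a primitive task
        successors = gamma(state, task)
        if not successors:                     # s is not in S_t
            return None
        return set(successors), rest           # gamma(s, t) and w minus {t}
    for head, condition, subtasks in methods:
        if head == task and condition(state):  # s is in S_m
            # Decomposition leaves the state unchanged; t is replaced by w'.
            return {state}, tuple(subtasks) + rest
    return None


# Toy one-dimensional instance (hypothetical): a state is (hunter_y, prey_y).
ACTIONS = {"move_north"}
METHODS = [
    ("chase_prey", lambda s: s[0] < s[1], ("move_north", "chase_prey")),
    ("chase_prey", lambda s: s[0] == s[1], ()),
]


def toy_gamma(state, action):
    """Deterministic stand-in for gamma: the hunter moves one cell north."""
    return {(state[0] + 1, state[1])}
```

Calling `progress((0, 2), ("chase_prey",), ACTIONS, METHODS, toy_gamma)` decomposes the task and leaves the state unchanged; calling it again on the resulting network executes `move_north`.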

The  YoYo  Planning  Algorithm 

In  this  section,  we  describe  YoYo,  a  forward-chaining 
HTN  planning  algorithm,  which  is  designed  to  com¬ 
bine  the  ability  of  exploiting  search-control  heuristics 
as  in  HTN  planning  with  symbolic  model-checking  tech¬ 
niques  in  a  single  planning  framework.  Figure  3  shows 
the  YoYo  planning  procedure  for  finding  solutions  for 
planning  problems  in  nondeterministic  domains. 

The input for the planning procedure YoYo consists of
a planning problem $(D, I, G)$ in a nondeterministic planning
domain $D = (V, S, A, R)$, an initial task network
$w$, and a set of HTN methods $M$ for the domain $D$.
The algorithm exploits tuples of the form $(S, w)$, called
situations, which are resolved by accomplishing the task
network $w$ in the states of $S$.

Starting with the initial situation $(I, w)$, YoYo recursively
generates successive sets of situations until a solution
for the given planning problem is generated. At
each iteration of the planning process, YoYo first checks
the set $X$ of current situations for cycles and goal states
by using the PruneSituations procedure shown in Figure 4.
For every situation $(S, w) \in X$, PruneSituations
checks the set of states $S$ and removes any state that
either appears already in the policy (and therefore, an
action has already been planned for it), or appears in
the set of goal states $G$ (and therefore, no action should
be planned for it). As a result, the situations returned
by the PruneSituations procedure are exactly the
situations that need to be explored and progressed into
successor ones in the search. We call such situations
the open situations of the current search trace.


Procedure PruneSituations(X, G, π)
    X' ← ∅
    for every situation (S, w) ∈ X
        S' ← S \ (G ∪ S_π)
        if S' ≠ ∅ then X' ← X' ∪ {(S', w)}
    return X'


Figure 4: The PruneSituations procedure.
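
A minimal explicit-set sketch of this pruning step follows. It assumes situations are `(states, network)` pairs and the policy is a set of `(state, action)` pairs; the name `prune_situations` and the use of Python sets in place of BDDs are our simplifications.

```python
def prune_situations(situations, goals, policy):
    """Keep only the open part of every situation.

    States that are goal states, or that already have an action in the
    policy, are removed; situations left with no states are dropped.
    """
    policy_states = {state for (state, _action) in policy}
    pruned = []
    for states, network in situations:
        remaining = frozenset(states) - goals - policy_states
        if remaining:
            pruned.append((remaining, network))
    return pruned
```

For example, with goal states `{3, 4}` and a policy that already covers state `1`, the situation `({1, 2, 3}, ("t",))` shrinks to `({2}, ("t",))`, and a situation containing only goal states disappears.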

After generating the open situations to be explored,
YoYo checks if there is an open situation $(S, w)$ such that
there are no more tasks to be performed in $w$, but the
goal has not been reached yet (i.e., $S \not\subseteq G$). In this case,
we have a failure in the search, and therefore, YoYo returns
failure from the current search trace. Otherwise,
YoYo uses the routine NoGoodPolicy to further check
if the current partial policy conforms to the requirements
of the kinds of solutions it is looking for. The
formal definition of this routine depends on whether we
are looking for strong or strong-cyclic solutions, so we
leave the discussion on this routine to the next section.

If the current partial policy $\pi$ does not meet
the requirements of being a solution to the input problem,
then YoYo returns failure from the current search
trace. Otherwise, $\pi$ is a solution to the underlying
planning problem if there are no open situations to be
explored further. This is true since $\pi$ does not violate
the requirements of the input problem, as it passed the
NoGoodPolicy check in the previous step.

Suppose there are open situations to explore for the
planner. Then YoYo selects one of them, say $(S, w)$,
and attempts to generate an action for every state in
$S$. The ComputeDecomposition routine, which is basically
an HTN-planning engine, is responsible for this
operation, as follows. In a situation $(S, w)$, let $t$ be a
task that has no predecessors in $w$. If $t$ is a primitive
task then $t$ can be executed directly in the world. Let
$a$ be an action that corresponds to $t$ such that $a$ can be
applied in each state in $S$; i.e., we have $S \subseteq S_a$. Note
that applying an action $a$ in a set of states $S$ does not
generate any new open situations: we require that $S$
must be a subset of $S_a$ because, otherwise, there is at
least one state in $S$ for which no action is applicable,
and this is a failure point in planning.


Procedure ComputeDecomposition(S, w, D, M)
    F ← ∅; X ← {(S, w)}
    loop
        if X = ∅ then return(F)
        select a tuple (S, w) ∈ X and remove it
        select a task t that has no predecessors in w
        if t is a primitive task then
            actions ← {a | a ∈ A is an action for t, and S ⊆ S_a}
            if actions = ∅ then return ∅
            select an action a from actions
            F ← F ∪ {(S, a, w \ {t})}
        else
            methods ← {m | m is a method in M for t and S ∩ S_m ≠ ∅}
            if methods = ∅ then return ∅
            select a method instance m from methods
            X ← X ∪ {(S ∩ S_m, (w \ {t}) ∪ w')}
            if S \ S_m ≠ ∅ then X ← X ∪ {(S \ S_m, w)}


Figure 5: The ComputeDecomposition procedure.


Procedure ComputeSuccessors(F, X)
    X' ← X ∪ {(succ(S, a), w) | (S, a, w) ∈ F}
    X' ← {(Compose(w, X'), w) | (S, w) ∈ X'}
    return X'


Figure 6: The ComputeSuccessors procedure.


If $t$ is not primitive, then we successively apply methods
to the nonprimitive tasks in $w$ until an action is
generated. Suppose we chose to apply a method $m$
to $t$. This generates two possible situations: (1) the
situation that arises from decomposing $t$ by $m$ in the
states $S \cap S_m$ in which $m$ is applicable, and (2) the
situation that specifies the states in which $m$ is not applicable;
i.e., the situation $(S \setminus S_m, w)$. In the former
case, we proceed with decomposing the subtasks of $t$ as
specified in $m$. In the latter case, on the other hand,
other methods for $t$ must be used. Note that if there
are no other methods for $t$ to be used in situations like
$(S \setminus S_m, w)$, then ComputeDecomposition returns the
empty set, forcing YoYo to correctly report a failure.
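
The splitting of a situation into $S \cap S_m$ and $S \setminus S_m$ can be sketched over explicit sets as follows. This is a simplified sketch, not the BDD-based routine: task networks are totally ordered tuples, each "select" is resolved by taking the first candidate without backtracking, an empty task network is treated as a failure, and the names `compute_decomposition`, `ACTIONS`, and `METHODS` are hypothetical.

```python
def compute_decomposition(states, network, actions, methods):
    """Return a list of (states, action, remaining_network) triples.

    `actions` maps an action name to a predicate that tells where the action
    is applicable; `methods` is a list of (task, condition, subtasks) triples.
    An empty list signals failure, as in the procedure of Figure 5.
    """
    result = []
    agenda = [(frozenset(states), tuple(network))]
    while agenda:
        current, net = agenda.pop()
        if not net:
            return []                      # assumption: no task left is a failure
        task, rest = net[0], net[1:]
        if task in actions:                # primitive task
            if not all(actions[task](s) for s in current):
                return []                  # S is not a subset of S_a
            result.append((current, task, rest))
            continue
        for head, condition, subtasks in methods:
            if head != task:
                continue
            covered = frozenset(s for s in current if condition(s))
            if covered:                    # S intersected with S_m is non-empty
                agenda.append((covered, tuple(subtasks) + rest))
                if current - covered:      # states left for other methods
                    agenda.append((current - covered, net))
                break
        else:
            return []                      # no method applies to any state
    return result


# Toy one-dimensional instance (hypothetical): a state is (hunter_y, prey_y).
ACTIONS = {
    "move_north": lambda s: s[0] < 2,
    "move_south": lambda s: s[0] > 0,
    "catch": lambda s: s[0] == s[1],
}
METHODS = [
    ("chase_prey", lambda s: s[0] < s[1], ("move_north", "chase_prey")),
    ("chase_prey", lambda s: s[0] > s[1], ("move_south", "chase_prey")),
    ("chase_prey", lambda s: s[0] == s[1], ("catch",)),
]
```

On the three states `(0, 2)`, `(2, 0)`, and `(1, 1)` with the network `("chase_prey",)`, the sketch produces one triple per method, so every state receives an action.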

ComputeDecomposition returns a set $F$ of the
form $\{(S_i, a_i, w_i)\}_{i=0}^{k}$. If $F = \emptyset$ then the
decomposition process has failed, since there is a state
$s \in S$ such that we cannot generate an action for $s$ by
using the methods provided for the underlying planning
domain. If $F \neq \emptyset$ then the routine has generated an
action $a_i$ for each state in $S$ (i.e., we have $S = \bigcup_i S_i$),
and a task network $w_i$ to be accomplished after
applying that action.

Suppose ComputeDecomposition returned a nonempty
set $F$ of tuples of the form $(S', a, w')$. Then,
YoYo proceeds with computing the successor situations
to be explored using the ComputeSuccessors routine as
follows: for each tuple $(S', a, w') \in F$, it first generates
the set of states that arises from applying $a$ in $S'$ by
using the function

$$\mathit{succ}(S', a) = \{s'' \mid s' \in S' \text{ and } (s', a, s'') \in R\},$$

where $R$ is the state-transition relation for the underlying
planning domain. The next situation corresponding
to this action application is defined as $(\mathit{succ}(S', a), w')$.

Once YoYo generates all of the next situations to be
explored, it composes the newly-generated situations
with respect to the task networks they specify to be
accomplished. More formally, the Compose function of
Figure 6 is defined as follows:

$$\mathit{Compose}(w, X) = \{s \mid (S, w) \in X \text{ and } s \in S\}.$$

The  composition  of  a  set  of  situations  is  an  optimization 
step  in  the  planning  process.  The  progression  of  open 
situations  may  create  a  set  of  situations  in  which  more 
than  one  situation  may  specify  the  same  task  network. 
Composing  such  situations  is  not  required  for  correct¬ 
ness,  but  it  has  the  advantage  of  planning  with  more 
compact  BDD-based  representations. 
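
The progression and composition steps can be sketched together over explicit sets. In this sketch, which is ours and not the BDD-based implementation, `gamma(s, a)` is assumed to return the successor set of a single state, task networks are hashable tuples, and merging by task network plays the role of Compose.

```python
def succ(states, action, gamma):
    """All states reachable by applying `action` in some state of `states`."""
    return frozenset(s2 for s in states for s2 in gamma(s, action))


def compute_successors(decomposition, situations, gamma):
    """Progress every (S, a, w) triple, then merge situations sharing a network."""
    merged = {}
    for states, network in situations:
        merged.setdefault(network, set()).update(states)
    for states, action, network in decomposition:
        merged.setdefault(network, set()).update(succ(states, action, gamma))
    return [(frozenset(states), network) for network, states in merged.items()]
```

Two triples that lead to the same remaining task network end up in one situation, so later steps handle their states as a single set.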

Strong  and  Strong-Cyclic  Planning 
using  YoYo 

The  abstract  planning  procedure  YoYo  can  be  used  for 
strong  and  strong-cyclic  planning  by  using  slightly  dif¬ 
ferent  routines  for  NoGoodPolicy,  which  specifies  the  dif¬ 
ferent  conditions  required  for  a  policy  to  be  a  strong  or 
a  strong-cyclic  solution  for  a  planning  problem.  In  this 
section,  we  discuss  the  definitions  for  these  routines. 

Strong  Planning.  In  strong  planning,  a  policy  must 
induce  an  execution  trace  to  a  goal  state  from  every 
state  that  is  reachable  from  the  initial  states  and  there 
should  be  no  cycles  in  the  execution  structure  induced 
by  that  policy.  This  condition  can  be  checked  as  follows: 

Procedure NoGoodPolicy_Strong(π, X, G, X0)
    S' ← ∅; S0 ← StatesOf(X0); S ← G ∪ StatesOf(X)
    while S' ≠ S
        S' ← S
        S ← S ∪ {s' | (s', a) ∈ π and γ(s', a) ⊆ S'}
        π ← π \ {(s, a) | s ∈ S and (s, a) ∈ π}
    if S0 ⊆ S and π = ∅ then return FALSE
    return TRUE

The above routine is built on the strong backward-preimage
function of (Cimatti et al. 2003). Starting
from the set of states in the open situations and the goal
states, it computes the set of states in the policy from
which an open state or a goal state is reachable. While
doing so, it removes the state-action pairs for those
states computed by the strong backward-preimage. At
the end of this process, if there is a state-action pair
left in the policy, then the policy induces
a cycle in the execution structure, and therefore, it
cannot be a strong solution for a planning problem.

NoGoodPolicy_Strong uses a subroutine called StatesOf,
which returns the set of all the states that appear
in a given set of situations. More formally,



$$\mathit{StatesOf}(X) = \{s \mid (S, w) \in X \text{ and } s \in S\}.$$


Strong-Cyclic Planning. The definition of the NoGoodPolicy
check for strong-cyclic planning differs from
the strong case only in the way that the backward image
is computed. In particular, in the strong-cyclic case, we
use the weak backward-preimage instead of the strong
one. This way, we detect only those cycles induced
by the input policy that violate the “fairness assumption”
of strong-cyclic planning as described before.

The NoGoodPolicy procedure for strong-cyclic
planning is defined as follows:


Procedure NoGoodPolicy_StrongCyclic(π, X, G, X0)
    S' ← ∅; S0 ← StatesOf(X0); S ← G ∪ StatesOf(X)
    while S' ≠ S
        S' ← S
        S ← S ∪ {s' | (s', a) ∈ π and S ∩ γ(s', a) ≠ ∅}
        π ← π \ {(s, a) | s ∈ S and (s, a) ∈ π}
    if S0 ⊆ S and π = ∅ then return FALSE
    return TRUE
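
The strong and strong-cyclic checks differ only in the preimage test, which the following explicit-set sketch makes visible through a single flag. The function name `no_good_policy`, the `cyclic` parameter, and the representation of `gamma` as a function returning Python sets are our assumptions; the sketch also assumes the policy contains only applicable actions, so every successor set is non-empty.

```python
def no_good_policy(policy, open_states, goals, initial_states, gamma, cyclic=False):
    """Return True if `policy` must be rejected, mirroring the NoGoodPolicy checks.

    `policy` is a set of (state, action) pairs and `gamma(s, a)` returns a set.
    With cyclic=False the strong preimage is used (all successors already
    reached); with cyclic=True the weak preimage (some successor reached).
    """
    remaining = set(policy)
    reached = set(goals) | set(open_states)
    previous = None
    while previous != reached:
        previous = set(reached)
        for state, action in remaining:
            successors = gamma(state, action)
            if cyclic:
                accepted = bool(successors & previous)
            else:
                accepted = successors <= previous
            if accepted:
                reached.add(state)
        # Drop the state-action pairs of states that are now accounted for.
        remaining = {(s, a) for (s, a) in remaining if s not in reached}
    return not (set(initial_states) <= reached and not remaining)
```

For a policy in which state 0 leads to state 1 and state 1 leads to either state 0 or the goal, the strong check rejects the policy because of the loop, while the strong-cyclic check accepts it.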


Discussion. Note that we are using these procedures
to verify that the generated policies meet the requirements
to be a solution for the underlying planning problems;
we are not using them for generating the solution policies
themselves as in (Cimatti et al. 2003), since that
generation is performed by the forward-chaining HTN-based
search engine in YoYo.

BDD-based  Implementation  of  Our 
Algorithms 

We have implemented a prototype of the YoYo planning
algorithm described in the previous section. Our current
implementation is built on both the ND-SHOP2
and  the  MBP  planning  systems.  It  extends  the 
ND-SHOP2  planning  system  for  (1)  planning  over  sets  of 
states  rather  than  a  single  state,  and  (2)  implementing 
the  NoGoodPolicy  routine  as  a  part  of  its  backtracking 
search.  It  uses  an  interface  to  MBP  for  exploiting  the 
machinery  of  BDDs  implemented  in  it. 

In this section, we present a framework that enables
us to implement the data structures of the YoYo algorithm
and its helper routines using BDD-based symbolic
model-checking primitives. In this framework, we
use  the  same  machinery  to  represent  the  states  of  a 
planning  domain  as  in  (Cimatti  et  al.  2003).  This  ma¬ 
chinery  is  based  on  using  propositional  formulae  to  com¬ 
pactly  represent  sets  of  states  and  possible  transitions 
between  those  states  in  a  planning  domain. 

We assume a vector $\mathbf{s}$ of propositions that represents
the current state of the world. For example,
in the Hunter-Prey world with a 3 x 3 grid and one
prey, $\mathbf{s}$ is $\{hx = 0, \ldots, hx = 3, hy = 0, \ldots, hy =
3, px = 0, \ldots, px = 3, py = 0, \ldots, py = 3, prey\_caught\}$.
A state is an assignment of the truth-values {true,
false} to each proposition in $\mathbf{s}$. We denote such an
assignment by $s(\mathbf{s})$.

Based on this formulation, a set of states S
corresponds to the formula S(s) such that

    $S(s) = \bigvee_{s \in S} s(s)$.

This definition of set of states is the basis of our
framework in this paper. It allows us to define YoYo’s
forward search mechanism over BDD-based representations
of sets of states, rather than single states.
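
The disjunction above can be illustrated with a minimal
Python sketch. It is not the BDD-based implementation:
formulas are modelled as plain predicates, a state is the
set of propositions that are true in it, and the names
PROPS, state_formula, set_formula, and models are
hypothetical helpers introduced only for this example.

```python
from itertools import product

# Hypothetical, tiny proposition vector (the paper's vector s is larger).
PROPS = ("hx=0", "hx=1", "prey-caught")

def state_formula(state):
    """Return s(s): a predicate that holds only for the given assignment."""
    return lambda assignment: assignment == state

def set_formula(states):
    """Return S(s): the disjunction of s(s) over all states s in S."""
    formulas = [state_formula(s) for s in states]
    return lambda assignment: any(f(assignment) for f in formulas)

def models(formula):
    """Enumerate every truth assignment over PROPS that satisfies formula."""
    result = set()
    for values in product((False, True), repeat=len(PROPS)):
        # An assignment is encoded as the set of propositions set to true.
        assignment = frozenset(p for p, v in zip(PROPS, values) if v)
        if formula(assignment):
            result.add(assignment)
    return result

s1 = frozenset({"hx=0"})
s2 = frozenset({"hx=1", "prey-caught"})
S = set_formula({s1, s2})
```

In this sketch, the assignments satisfying S are exactly
the states s1 and s2, which is the sense in which the
formula S(s) stands for the set S.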

We also assume another vector s' of propositional
variables to represent the next states of the world.
Similarly, we use a vector a of action variables, which
allows representing a set of actions at the same time.
A policy $\pi$, which is a set of state-action pairs,
can be represented as a formula $\pi(s, a)$ in the
variables s and a. We denote a set of states S with a
formula S(s) in the state vector s as before. We
represent a situation as a pair of the form (S(s), w),
where w is a task network, as described in the previous
section.

The initial situation can be represented by
$X_0 = \{(I(s), w)\}$, where I(s) represents the initial
set of states and w is the initial task network.
Similarly, we represent the set of goal states with the
formula G(s). We assume the existence of a
state-transition relation R, which can be represented
as R(s, a, s'), where s denotes the current state
vector, a denotes the current action vector, and s'
denotes the next state vector.

The formulations of the inequality of sets, set
difference operations, and subset relations constitute
the most basic primitives used in conditionals and
termination conditions of the loops of our algorithms.
These operations can be easily encoded in terms of
basic logical operations on the formulas described
above.
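
One possible encoding of these three primitives is
sketched here in Python, under the assumption that a
formula is a predicate over truth assignments and that
validity and satisfiability are checked by brute-force
enumeration (a BDD package would do this symbolically).
VARS and all function names are hypothetical.

```python
from itertools import product

VARS = ("p", "q")  # hypothetical state variables

def assignments():
    """Yield every truth assignment over VARS as a dict."""
    for values in product((False, True), repeat=len(VARS)):
        yield dict(zip(VARS, values))

def difference(A, B):
    """Set difference A \\ B corresponds to the formula A and not B."""
    return lambda v: A(v) and not B(v)

def is_subset(A, B):
    """A is a subset of B iff the implication A => B is valid."""
    return all((not A(v)) or B(v) for v in assignments())

def differ(A, B):
    """A != B iff some assignment satisfies exactly one of A and B."""
    return any(A(v) != B(v) for v in assignments())

A = lambda v: v["p"]             # states where p holds
B = lambda v: v["p"] and v["q"]  # states where p and q hold
```

Here B is a subset of A but not the other way round, the
two sets differ, and A \ B contains only the assignment
with p true and q false.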

The result of applying an action a in a set of states
S can be represented as the formula:

    $\exists s' : S(s) \wedge R(s, a, s')\ [s'/s]$,

where [s'/s] is called the forward-shifting operation.
Note that the above formula represents the succ(S, a)
function described in the previous section.
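
As an explicit-state illustration of succ(S, a), the
following Python sketch computes the set of states
reachable from S in one step of action a. It assumes
that the transition relation R is given as a set of
(state, action, next_state) triples; the state and action
names are made up for the example, and a symbolic
implementation would perform the same computation on
BDDs instead of enumerating triples.

```python
def succ(S, a, R):
    """Return all next states reachable from a state in S by action a.

    R is assumed to be a set of (state, action, next_state) triples.
    """
    return {s_next for (s, act, s_next) in R if act == a and s in S}

# Hypothetical nondeterministic domain: "move" from s0 has two outcomes.
R = {
    ("s0", "move", "s1"),
    ("s0", "move", "s2"),
    ("s1", "move", "s2"),
    ("s0", "wait", "s0"),
}

after_move = succ({"s0"}, "move", R)
```

In this example, after_move contains both s1 and s2,
reflecting the two nondeterministic outcomes of "move"
in s0.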

The StatesOf primitive we use for computing the set
of all states described by a set of situations can be
represented as a set-union operator over the situations
we are interested in. In particular, if we want to
compute the set of all states of
$X = \{x_1, x_2, \ldots, x_n\}$, then this operation
corresponds to the formula
$S_1(s) \vee S_2(s) \vee \ldots \vee S_n(s)$, where we
have $x_i = (S_i, w_i)$.
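
A minimal explicit-set sketch of StatesOf in Python,
assuming each situation is a pair of a set of states and
a task network (represented here by a placeholder tuple
of task names, which is an assumption of this example):

```python
def states_of(situations):
    """Union of the state sets S_i over all situations (S_i, w_i)."""
    result = set()
    for states, _task_network in situations:
        result |= states
    return result

# Hypothetical situations; the task networks are placeholders.
X = [
    (frozenset({"s0", "s1"}), ("chase-prey",)),
    (frozenset({"s1", "s2"}), ("chase-prey", "catch-prey")),
]

all_states = states_of(X)
```

The task networks play no role in the union; only the
state components of the situations are combined.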

We are now ready to give the formulations for our
algorithms. The PruneSituations procedure is built on
set-difference and set-union operations, which can be
represented as follows:
$S(s) \wedge \neg(G(s) \vee \exists a : \pi(s, a))$.
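
The state-level formula used by PruneSituations can be
sketched with explicit sets in Python. This covers only
the formula itself, not the full procedure, and assumes
that a policy is a set of (state, action) pairs; the
function name prune_states is hypothetical.

```python
def prune_states(S, G, policy):
    """Keep the states of S that are neither goal states nor
    already covered by some state-action pair of the policy."""
    covered = {s for (s, _action) in policy}  # states with some policy action
    return S - (G | covered)

S = {"s0", "s1", "s2", "s3"}
G = {"s3"}
policy = {("s0", "go")}

remaining = prune_states(S, G, policy)
```

Only s1 and s2 remain: s3 is a goal state, and s0 already
has an action assigned by the policy.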

The NoGoodPolicy procedures for strong and
strong-cyclic planning are based on two primitives for
computing weak and strong preimages of a particular set
of states. These preimage computations correspond to

    $\exists a \exists s' . \pi(s, a) \wedge S(s')$ and
    $\exists a \exists s' . \pi(s, a) \Rightarrow S(s')$,

respectively.
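
The following Python sketch illustrates weak and strong
preimages with explicit sets. It relies on assumptions
that go beyond the formulas as printed: it follows the
usual definitions from the symbolic model-checking
planning literature, in which the transition relation R
links a state-action pair to its possible successors, and
it returns states rather than state-action pairs. All
names are hypothetical.

```python
def successors(s, a, R):
    """All next states of applying action a in state s."""
    return {t for (u, b, t) in R if u == s and b == a}

def weak_preimage(S, policy, R):
    """States whose policy action MAY lead into S."""
    return {s for (s, a) in policy if successors(s, a, R) & S}

def strong_preimage(S, policy, R):
    """States whose policy action is applicable and MUST lead into S."""
    result = set()
    for (s, a) in policy:
        nxt = successors(s, a, R)
        if nxt and nxt <= S:
            result.add(s)
    return result

R = {("s0", "go", "s1"), ("s0", "go", "s2"), ("s1", "go", "s2")}
policy = {("s0", "go"), ("s1", "go")}

weak = weak_preimage({"s2"}, policy, R)
strong = strong_preimage({"s2"}, policy, R)
```

In this example, s0 is in the weak preimage of {s2} but
not in the strong one, because "go" in s0 may also lead
to s1.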

In the ComputeDecomposition routine, we check
whether a method or an action is applicable in a given
set S of states or not. This check corresponds to the
following formulas: $S(s) \Rightarrow S_a(s)$ and
$S(s) \wedge S_m(s)$, where S(s) represents the set of
states in which we are performing these checks, and
$S_a(s)$ and $S_m(s)$ represent the sets of all states
in which the action a and the method m are applicable,
respectively.
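
With explicit sets, the two applicability checks can be
sketched in Python as a subset test and an intersection.
The function names are hypothetical, and the reading
that the conjunction selects the part of S in which the
method applies is an interpretation made for this
example.

```python
def action_applicable(S, S_a):
    """S(s) => S_a(s) is valid iff every state of S allows the action."""
    return S <= S_a

def method_applicable_states(S, S_m):
    """S(s) and S_m(s): the states of S in which the method applies."""
    return S & S_m

S_a = {"s0", "s1", "s2"}  # states where a hypothetical action applies
S_m = {"s0", "s1"}        # states where a hypothetical method applies

ok = action_applicable({"s0", "s1"}, S_a)
part = method_applicable_states({"s0", "s3"}, S_m)
```

Here the action is applicable in the whole set {s0, s1},
while the method applies only in the s0 part of {s0, s3}.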

Finally, we can represent the update of a policy $\pi$
by a set of state-action pairs $\pi'$ as follows:
$\pi(s, a) \vee \pi'(s, a)$.

Experimental  Evaluation 

We have designed three sets of experiments in the
Hunter-Prey domain, described earlier. For our
experiments, we assumed that the domain is fully
observable in the sense that the hunter can always
observe the location of the prey. We also assumed that
the hunter moves first in the world, and the prey moves
afterwards. The nondeterminism for the hunter is
introduced through the movements of the prey; the prey
may take any of its five actions, independently of the
hunter’s move.

In our experiments, we investigated the performance of
YoYo, ND-SHOP2, and MBP. For all our experiments, we
used an HP Pavilion N5415 laptop with 256MB memory,
running Linux Fedora Core 2. We set the time limit for
the planners to 40 minutes. Each time ND-SHOP2 or MBP
had a memory overflow or could not solve a problem
within our time limit, we ran it again on another
problem of the same size. We omitted each data point on
which this happened more than five times, but included
the data points where it happened 1 to 4 times.

Experimental Set 1. In these experiments, we aimed to
investigate how well YoYo is able to cope with large
problems compared to ND-SHOP2 and MBP. To achieve this
objective, we designed experiments on hunter-prey
problems with increasing grid sizes. For these
problems, we assumed there is only one prey in the
world in order to keep the amount of nondeterminism for
the hunter at a minimum.

Figure 7 shows the results of the experiments for grid
sizes n = 5, 6, ..., 10. For each value of n, we
randomly generated 20 problems and ran MBP, ND-SHOP2,
and YoYo on those problems. In this figure, we report
the average running times required by the planners on
those problems.

For grids larger than n = 10, ND-SHOP2 was not able to
solve the planning problems due to memory overflows.
This is because the sizes of the solutions in this
domain are very large, and therefore ND-SHOP2 runs out
of memory as it tries to store them explicitly. Note
that this domain admits only high-level search
strategies such as “look at the prey and move towards
it.” Although this strategy helps the planner prune a
portion of the search space, such pruning alone does
not compensate for the explosion in the size of the
explicit representations of the solutions for the
problems.

On the other hand, both YoYo and MBP were able to
solve all of the problems in these experiments.




Figure 7: Average running times (in seconds) of YoYo,
ND-SHOP2, and MBP in the hunter-prey domain as a
function of the grid size, with one prey.


Figure 8: Average running times (in seconds) for YoYo
and MBP on some larger problems in the hunter-prey
domain as a function of the grid size, with one prey.


The difference between the performances of YoYo and
ND-SHOP2 demonstrates the impact of the use of
BDD-based representations: YoYo, using the same
HTN-based heuristic as ND-SHOP2, was able to scale up
as well as MBP since it is able to exploit BDD-based
representations of the problems and their solutions.

In order to see how YoYo performs in larger problems
compared to MBP, we have also experimented with YoYo
and MBP in much larger grids. Figure 8 shows the
results of these experiments in which, using the same
setup as above, we varied the size of the grids in the
planning problems as n = 5, 10, 15, ..., 45, 50.

These results show that YoYo is able to perform better
than MBP with increasing grid size. The running times
required by both planners increase in larger grids;
however, as shown in Figure 8, this increase is much
slower for YoYo than for MBP, for the following
reasons: (1) YoYo is able to combine the advantages of
exploiting HTN-based search-control heuristics with the
advantages of using BDD-based representations, whereas
MBP cannot exploit HTN-based strategies to complement
its BDD-based planning techniques; and (2) YoYo, being
a forward planner, considers only those states that are
reachable from the initial states of the planning
problems, whereas MBP’s backward-chaining algorithms
explore states that are not reachable from the initial
states of the problems at all.


Figure 9: Average running times (in seconds) of
ND-SHOP2, YoYo, and MBP on problems in the Hunter-Prey
domain as a function of the number of preys, with a
4 x 4 grid. MBP was not able to solve planning problems
with 5 and 6 preys within 40 minutes.

Experimental Set 2. In order to investigate the effect
of combining search-control strategies and BDD-based
representations in YoYo, we used the following
variation of the Hunter-Prey domain. We assumed that we
have more than one prey in the world, and that prey i
cannot move to any location within the neighbourhood of
prey i + 1 in the world. In such a setting, the amount
of nondeterminism for the hunter after each of its
moves increases combinatorially with the number of
preys in the domain. Furthermore, the BDD-based
representations of the underlying planning domain
explode in size under these assumptions, mainly because
the movements of the preys are dependent on each other.
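
The combinatorial growth can be made concrete with a
small Python sketch that enumerates joint prey moves. It
is only an upper-bound illustration: the five action
names are assumed, and the sketch ignores grid boundaries
and the neighbourhood restriction described above, both
of which can rule out some combinations.

```python
from itertools import product

# Assumed names for the five prey actions (not taken from the paper).
PREY_ACTIONS = ("north", "south", "east", "west", "stay")

def joint_prey_moves(num_preys):
    """All combinations of one action per prey (unconstrained)."""
    return list(product(PREY_ACTIONS, repeat=num_preys))

# Number of joint outcomes the hunter may face after one of its moves.
growth = {p: len(joint_prey_moves(p)) for p in range(2, 7)}
```

Under these assumptions the count is 5^p, so it grows
from 25 joint outcomes for two preys to 15625 for six.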

In this adapted domain, we used a search-control
strategy in ND-SHOP2 and YoYo that tells the planners
to chase the first prey until it is caught, then the
second prey, and so on, until all of the preys are
caught. Note that this heuristic allows for abstracting
away from the huge state space: when the hunter is
chasing a prey, it does not need to know the locations
of the other preys in the world, and therefore it does
not need to reason about or store information about
those locations.

In the experiments, we varied the number of preys as
p = 2, ..., 6 in a 4 x 4 grid world. We randomly
generated 20 problems for each number of preys.
Figure 9 shows the results of these experiments with
MBP, ND-SHOP2, and YoYo. These results demonstrate the
power of combining HTN-based search-control heuristics
with BDD-based representations of states and solutions
in our planning problems: YoYo was able to outperform
both ND-SHOP2 and MBP. The running times required by
MBP grow exponentially faster than those required by
YoYo with the increasing number of preys, since MBP
cannot exploit HTN-based heuristics. Note that ND-SHOP2
performs much better than MBP in the presence of good
search-control heuristics.

Experimental Set 3. In order to further investigate
YoYo’s performance compared to that of ND-SHOP2 and
MBP, we have also performed an extended set of
experiments with multiple preys and with increasing
grid sizes. We varied the number of preys as
p = 2, ..., 6 and the grid sizes as n = 3, 4, 5, 6. As
before, we randomly generated 20 problems for each
combination of p and n.

Table 1 reports the average running times required by
YoYo, MBP, and ND-SHOP2 in these experiments. These
results provide further evidence for our conclusions.
Search-control heuristics help both YoYo and ND-SHOP2,
as they both outperform MBP with the increasing number
of preys. However, with increasing grid sizes, ND-SHOP2
runs into memory problems as before due to its explicit
representations of states and solutions of the
problems. YoYo, on the other hand, was able to cope
very well with increases in both the grid size and the
number of preys in these problems.

Discussion on the Results. Our experimental results
demonstrate the importance of using HTN-based
search-control heuristics and BDD-based representations
in a single forward-chaining framework. The
search-control heuristics exploit the structure of the
underlying planning problems, and therefore they result
in more compact and structured BDD representations of
the planning problems and domains. For example, in the
hunter-prey domain, the strategy that tells YoYo to
focus on catching one prey while ignoring the other
preys provides a combinatorial reduction in the
representations of the solutions for the problems and
the state-transition relation for the domain. BDDs
provide even further compactness in those reduced
representations. Note that the same strategy did not
work very well for ND-SHOP2 in large problems due to
explicit representations of the problems and the
domain. Note also that BDD-based representations alone
did not work very well for MBP in problems with an
increasing number of preys, since those representations
are not sufficient to abstract away from the irrelevant
portions of the state space. YoYo, on the other hand,
was able to cope very well with problems with both
characteristics.

Table 1: Average running times of MBP, ND-SHOP2,
and YoYo on Hunter-Prey problems with increasing
number of preys and increasing grid size.

Preys  Grid  MBP        ND-SHOP2         YoYo
2      3x3   0.343      0.78             0.142
2      4x4   0.388      3.847            0.278
2      5x5   1.387      18.682           0.441
2      6x6   3.172      76.306           0.551
3      3x3   1.1        1.72             0.329
3      4x4   11.534     12.302           0.521
3      5x5   133.185    58.75            0.92
3      6x6   368.166    250.315          1.404
4      3x3   29.554     3.256            0.448
4      4x4   492.334    31.591           0.759
4      5x5   >40 mins   176.49           1.818
4      6x6   >40 mins   547.911          3.295
5      3x3   233.028    5.483            0.655
5      4x4   >40 mins   56.714           1.275
5      5x5   >40 mins   304.03           3.028
5      6x6   >40 mins   memory-overflow  7.059
6      3x3   2158.339   8.346            0.781
6      4x4   >40 mins   73.435           1.786
6      5x5   >40 mins   486.112          5.221
6      6x6   >40 mins   memory-overflow  11.826

Related Work

Over the years, several planning techniques have been
developed for planning in nondeterministic domains.
Examples include satisfiability and planning-graph
based techniques, symbolic model-checking approaches,
and forward-chaining heuristic search.

The planning-graph based techniques can address
conformant planning, where the planner has
nondeterministic actions and no observability, and a
limited form of partial observability (Smith & Weld
1998; Weld, Anderson, & Smith 1998; Brafman & Hoffmann
2004). To the best of our knowledge, examples of
nondeterministic satisfiability-based planners include
(Castellini, Giunchiglia, & Tacchella 2003; Ferraris &
Giunchiglia 2000) on conformant planning, and (Rintanen
1999) on conditional planning.

The idea of using symbolic model-checking (SMC) to do
planning in nondeterministic domains was first
introduced in (Cimatti et al. 1997; Giunchiglia &
Traverso 1999; Cimatti, Roveri, & Traverso 1998).
(Cimatti et al. 2003) gives a full formal account and
an extensive experimental evaluation of planning for
these three kinds of solutions. Other approaches
include (Jensen & Veloso 2000; Jensen, Veloso, &
Bowling 2001; Rintanen 2002; Jensen, Veloso, & Bryant
2003). SMC-planning has been extended to deal with
partial observability (Bertoli et al. 2001b) and
extended goals (Pistore & Traverso 2001; Dal Lago,
Pistore, & Traverso 2002). The MBP planning system
(Bertoli et al. 2001a) is capable of handling both.

Planning based on Markov Decision Processes (MDPs)
(Boutilier, Dean, & Hanks 1999) also has actions with
more than one possible outcome, but models the possible
outcomes using probabilities and utility functions, and
formulates the planning problem as an optimization
problem. For problems that can be solved either by MDPs
or by model-checking-based planners, the latter have
been empirically shown to be more efficient (Bonet &
Geffner 2001).

(Kuter & Nau 2004) presents a generalization technique
to transport the efficiency improvements that have been
achieved for forward-chaining planning in deterministic
domains over to the nondeterministic case. Under
certain conditions, they showed that a
“nondeterminized” algorithm’s time complexity is
polynomially bounded by the time complexity of the
deterministic version. ND-SHOP2 is an HTN planner
developed using this technique for SHOP2 (Nau et al.
2003). YoYo, the HTN planner described in this work, is
built on both the ND-SHOP2 and the MBP planning
systems.

In YoYo, we focused only on HTN-based heuristics as in
ND-SHOP2 and on combining them with BDD-based
representations as in MBP. However, it is also possible
to develop variants of YoYo designed to work with other
search-control techniques developed for forward
planning, such as temporal-logic based ones as in
(Bacchus & Kabanza 2000; Kvarnstrom & Doherty 2001). In
our future work, we intend to investigate such
techniques in YoYo along with the HTN-based ones we
developed in this work, and compare the advantages and
disadvantages of using different search-control
mechanisms.

Conclusions 

This paper describes a new algorithm for planning in
fully observable nondeterministic domains. This
algorithm enables us to combine the search-control
ability of HTN planning with the state-abstraction
ability of BDD-based symbolic model-checking. Our
experimental evaluation shows that the combination is a
potent one: it has large advantages in speed, memory
usage, and scalability.

In the future, we plan to extend the comparison to
other domains, to further confirm our hypothesis on the
benefits of the proposed approach. We also plan to
devise algorithms that further integrate symbolic model
checking and HTNs, by combining HTN-based forward
search with MBP’s backward-search algorithms that are
based on symbolic model-checking.